fix: add node health check to incr/decr before mutate operation #186
Open
AustinWheel wants to merge 2 commits into master from
incr/decr operations bypassed the ensureWriteQueueSize check that other write paths (set, add, touch) use, causing them to block on unreachable or stalled nodes for the full mutateOperationTimeout duration. A single unhealthy node could cause all incr/decr calls to block, exhausting the caller's thread pool.

This adds the same ensureWriteQueueSize gate to incr/decr. If the target node is inactive or its write queue is full, the operation returns -1 immediately, consistent with the documented API contract. The existing reconciliation logic in EVCacheImpl.incr()/decr() already handles -1 as a zone failure and will sync the value from healthy zones on the next successful operation.
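A minimal sketch of the gate described above, assuming a simplified node model. `HealthNode`, `MAX_WRITE_QUEUE_SIZE`, and the `mutateCalls` counter that stands in for the blocking mutate call are hypothetical names for illustration; they are not EVCache or spymemcached APIs, and the real ensureWriteQueueSize logic is not reproduced here. The point is the ordering: the health check runs first, and an unhealthy node yields -1 without ever entering the mutate path.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: HealthNode, MAX_WRITE_QUEUE_SIZE and mutateCalls are
// illustrative names, not EVCache or spymemcached APIs.
final class IncrGateSketch {

    // Minimal stand-in for a cache node's health signals (assumed interface).
    interface HealthNode {
        boolean isActive();
        int writeQueueSize();
    }

    // Assumed threshold, for illustration only.
    static final int MAX_WRITE_QUEUE_SIZE = 16;

    // Counts how often the (simulated) blocking mutate path is entered.
    static final AtomicInteger mutateCalls = new AtomicInteger();

    // The gate: reject inactive nodes and nodes whose write queue is over the limit.
    static boolean ensureWriteQueueSize(HealthNode node) {
        if (!node.isActive()) {
            return false;
        }
        return node.writeQueueSize() <= MAX_WRITE_QUEUE_SIZE;
    }

    // incr with the gate in front of it; -1 signals "could not increment".
    static long incr(HealthNode node, long current, long by) {
        if (!ensureWriteQueueSize(node)) {
            return -1; // fail fast; zone-level reconciliation treats -1 as a failure
        }
        mutateCalls.incrementAndGet(); // stands in for the blocking mutate call
        return current + by;
    }

    // Test helper: builds a node with fixed health signals.
    static HealthNode node(boolean active, int queueSize) {
        return new HealthNode() {
            @Override public boolean isActive() { return active; }
            @Override public int writeQueueSize() { return queueSize; }
        };
    }

    private IncrGateSketch() {}
}
```

For example, `incr(node(false, 0), 5, 1)` returns -1 without touching the mutate path, while `incr(node(true, 0), 5, 1)` returns 6.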
shy-1234 reviewed Mar 5, 2026
… write path

- EVCacheImpl.decr: fix reconciliation to pick the minimum non-(-1) value across nodes instead of the maximum. For decr, the most up-to-date node has the lowest value (most decremented), so the old max-pick logic would overwrite correctly decremented nodes with stale higher values.
- EVCacheClient.ensureWriteQueueSize: add an isAvailable() check before entering the retry/sleep loop. This fast-fails writes to inactive nodes instead of blocking request threads through 3 retry iterations. It affects all write operations that go through ensureWriteQueueSize (incr, decr, set, delete, etc.).
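The decr reconciliation fix can be illustrated with a small, self-contained sketch. It assumes each zone's result arrives as a `long`, with -1 meaning the zone failed, and that real counter values are never -1 (memcached counters are unsigned 64-bit values and decr stops at 0). `ReconcileSketch`, `pickAuthoritative`, and `zonesToSync` are hypothetical names; this is not the actual EVCacheImpl code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of cross-zone counter reconciliation; not EVCacheImpl's code.
// Assumes -1 means "zone failed" and real counter values are never negative.
final class ReconcileSketch {

    enum Op { INCR, DECR }

    // Picks the value to propagate to the other zones. For incr the most up-to-date
    // zone holds the highest value; for decr it holds the lowest. Failed zones (-1)
    // are ignored. Returns -1 when every zone failed.
    static long pickAuthoritative(Op op, long[] zoneValues) {
        boolean found = false;
        long best = -1;
        for (long v : zoneValues) {
            if (v == -1) {
                continue; // skip failed zones
            }
            if (!found) {
                best = v;
                found = true;
            } else if (op == Op.INCR) {
                best = Math.max(best, v);
            } else {
                best = Math.min(best, v);
            }
        }
        return best;
    }

    // Returns the indexes of zones whose value differs from the chosen one
    // (failed zones included), i.e. the zones a sync would overwrite.
    static List<Integer> zonesToSync(long[] zoneValues, long chosen) {
        List<Integer> stale = new ArrayList<>();
        if (chosen == -1) {
            return stale; // no healthy value to copy from
        }
        for (int i = 0; i < zoneValues.length; i++) {
            if (zoneValues[i] != chosen) {
                stale.add(i);
            }
        }
        return stale;
    }

    private ReconcileSketch() {}
}
```

With zone results `{7, -1, 9}`, decr picks 7 and marks zones 1 and 2 for sync. The old max-pick would have chosen 9 and pushed that stale value onto zone 0, the one that had decremented correctly.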
Collaborator
Code review

No issues found. Checked for bugs and CLAUDE.md compliance. Reviewed the following changes:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.